Semi-Supervised Clustering with Partial Background Information

Authors

  • Jing Gao
  • Pang-Ning Tan
  • Haibin Cheng
Abstract

Incorporating background knowledge into unsupervised clustering algorithms has been the subject of extensive research in recent years. Nevertheless, existing algorithms implicitly assume that the background information, typically specified in the form of labeled examples or pairwise constraints, has the same feature space as the unlabeled data to be clustered. In this paper, we address the new problem of incorporating partial background knowledge into clustering, where the labeled examples share only a moderate number of features with the unlabeled data. We formulate this as a constrained optimization problem and propose two learning algorithms to solve it, one based on hard clustering and one on fuzzy clustering. An empirical study on a variety of real data sets shows that the proposed algorithms improve the quality of clustering results even with limited labeled examples.
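
The abstract does not give the constrained objective or the two algorithms, so the following is only a minimal illustrative sketch of the problem setting, not the authors' method. It swaps in a simple seeded k-means: labeled examples that cover only a subset of the features are used to seed clusters in the shared subspace, and standard hard clustering then proceeds over all features of the unlabeled data. The function name `seeded_kmeans_partial`, the `shared_idx` parameter, and the fallback for empty clusters are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    return [sum(col) / len(col) for col in zip(*rows)]


def seeded_kmeans_partial(unlabeled, labeled_shared, labels, shared_idx, n_iter=20):
    """Hypothetical helper: hard clustering of `unlabeled`, seeded by labeled
    examples that only cover the columns listed in `shared_idx`."""
    classes = sorted(set(labels))
    n_clusters = len(classes)

    # Class centroids computed in the shared feature subspace only.
    seeds = [
        mean_vector([r for r, lab in zip(labeled_shared, labels) if lab == c])
        for c in classes
    ]

    # Initial assignment: nearest labeled centroid, using shared features only.
    assign = [
        min(range(n_clusters),
            key=lambda k: sq_dist([row[j] for j in shared_idx], seeds[k]))
        for row in unlabeled
    ]

    global_mean = mean_vector(unlabeled)
    centroids = []
    for _ in range(n_iter):
        # Update step: full-dimensional centroids from the unlabeled data.
        centroids = []
        for k in range(n_clusters):
            members = [row for row, a in zip(unlabeled, assign) if a == k]
            if members:
                centroids.append(mean_vector(members))
            else:
                # Assumed fallback for an empty cluster: global mean, with the
                # shared coordinates taken from the labeled seed.
                c = list(global_mean)
                for j, s in zip(shared_idx, seeds[k]):
                    c[j] = s
                centroids.append(c)

        # Assignment step: nearest centroid over all features.
        new_assign = [
            min(range(n_clusters), key=lambda k: sq_dist(row, centroids[k]))
            for row in unlabeled
        ]
        if new_assign == assign:
            break
        assign = new_assign

    return [classes[a] for a in assign], centroids


# Toy usage: unlabeled points have 3 features; labeled ones cover only the first 2.
X = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.2], [5.0, 5.0, 5.0], [5.1, 5.0, 4.9]]
L = [[0.0, 0.1], [5.0, 5.1]]
y = [0, 1]
cluster_labels, centers = seeded_kmeans_partial(X, L, y, shared_idx=[0, 1])
```

In this sketch the labeled data influence only the initial partition; a constrained formulation such as the one the abstract describes would presumably keep the labeled information in the objective throughout the optimization.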

Related resources

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Semi-supervised cross-entropy clustering with information bottleneck constraint

In this paper, we propose a semi-supervised clustering method, CECIB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting g...

An Improved Semi-supervised Fuzzy Clustering Algorithm

Semi-supervised clustering is an important method which can improve clustering performance by introducing partial supervised information. This paper mainly studies the semi-supervised fuzzy clustering based on Mahalanobis distance and Gaussian Kernel for SCAPC algorithm. Here, we give a new semi-supervised fuzzy clustering objective function. By solving the optimization problem with above objec...

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...

Mind the Eigen-Gap, or How to Accelerate Semi-Supervised Spectral Learning Algorithms

Semi-supervised learning algorithms commonly incorporate the available background knowledge such that an expression of the derived model’s quality is improved. Depending on the specific context, quality can take several forms and can be related to the generalization performance or to a simple clustering coherence measure. Recently, a novel perspective of semi-supervised learning has been put for...

Publication date: 2006